This is an R Markdown Notebook.
Deep Learning… This lecture is dedicated to the implementation of Deep Learning models for our data.
The idea is to play with one particular feature, e.g. Tubing Process, Impedance. We will not yet implement this in Shiny, but rather focus on preparing our data, then fitting and testing the model.
Let’s get our time-series data as a dataframe first…
library(tidyverse)
# ============= READ DATA =================
# Read our small data ...
DF_Data_Recent <- readRDS("DF_Data_Process_Recent.data")
DF_Equipm <- read_csv("DF_EquipmData.csv")
# data frame containing Event Names
DF_EvCode <- read_csv("DF_EvCodeDataProject.csv")
# Data manipulation and saving to the DF_TEMP
DF_TEMP <- DF_Data_Recent %>%
# join to decode equipment serial number
inner_join(DF_Equipm, by = "IDEquipment") %>%
# join to decode Event Code meaning
inner_join(DF_EvCode, by = "EventCode") %>%
# select only the columns needed
select(StartDate, Name, AnalogVal, EventText)
Then I will plot the data for all 4 machines, just to remember what it looks like…
# creating human readable data and visualize them
DF_TEMP %>%
filter(EventText == "Tubing Process, resistance Ohm") %>%
ggplot(aes(x = StartDate, y = AnalogVal, col = Name)) + geom_point() + facet_grid(~Name)
Looking at the chart above, I can see that machine #1 seems to be the best. I will take that one as a reference to build my Deep Learning model, extracting its dataset with the filter() function.
# extracting only one machine
DF_M1 <- DF_TEMP %>%
filter(EventText == "Tubing Process, resistance Ohm") %>%
filter(Name == "Machine #1") %>%
select(StartDate, AnalogVal) %>%
head(50)
Now we need to transpose our data from long to wide structure!
# transposing our dataframe
DF_t <- as.data.frame(t(as.matrix(DF_M1)))
head(DF_t)
## 1 2 3
## StartDate 2017-09-03 17:44:43 2017-09-03 17:44:46 2017-09-03 17:44:53
## AnalogVal 51 45 47
## 4 5 6
## StartDate 2017-09-03 17:47:41 2017-09-03 17:46:19 2017-09-03 17:46:18
## AnalogVal 46 46 53
## 7 8 9
## StartDate 2017-09-03 17:47:22 2017-09-03 17:47:18 2017-09-03 17:47:40
## AnalogVal 46 52 52
## 10 11 12
## StartDate 2017-09-03 17:48:58 2017-09-03 17:49:53 2017-09-03 17:48:25
## AnalogVal 51 47 45
## 13 14 15
## StartDate 2017-09-03 17:48:20 2017-09-03 17:48:59 2017-09-03 17:54:53
## AnalogVal 52 45 46
## 16 17 18
## StartDate 2017-09-03 17:59:53 2017-09-03 18:04:53 2017-09-03 18:05:09
## AnalogVal 47 49 43
## 19 20 21
## StartDate 2017-09-03 18:07:01 2017-09-03 18:15:53 2017-09-03 18:18:59
## AnalogVal 49 45 45
## 22 23 24
## StartDate 2017-09-03 18:29:53 2017-09-03 18:39:53 2017-09-03 18:09:53
## AnalogVal 45 46 46
## 25 26 27
## StartDate 2017-09-03 18:34:53 2017-09-03 18:31:41 2017-09-03 18:32:58
## AnalogVal 45 45 45
## 28 29 30
## StartDate 2017-09-03 18:24:53 2017-09-03 18:15:51 2017-09-03 18:18:04
## AnalogVal 45 51 45
## 31 32 33
## StartDate 2017-09-03 18:18:58 2017-09-03 18:32:54 2017-09-03 18:31:40
## AnalogVal 51 51 52
## 34 35 36
## StartDate 2017-09-03 18:19:53 2017-09-03 18:18:01 2017-09-03 18:14:53
## AnalogVal 49 51 45
## 37 38 39
## StartDate 2017-09-03 18:44:53 2017-09-03 18:47:39 2017-09-03 18:48:45
## AnalogVal 45 45 45
## 40 41 42
## StartDate 2017-09-03 19:00:13 2017-09-03 19:00:12 2017-09-03 19:19:53
## AnalogVal 45 52 45
## 43 44 45
## StartDate 2017-09-03 19:09:53 2017-09-03 18:45:55 2017-09-03 18:47:11
## AnalogVal 45 45 51
## 46 47 48
## StartDate 2017-09-03 18:47:37 2017-09-03 18:47:13 2017-09-03 18:45:54
## AnalogVal 51 45 51
## 49 50
## StartDate 2017-09-03 18:43:52 2017-09-03 18:59:53
## AnalogVal 52 45
Now, we can actually ‘forget’ the StartDate values and simply reshape the values into a matrix of, say, 150 columns and 20 rows… Of course, one needs to recover some basic R skills for that :) and if not, use Stack Overflow… [How to turn a vector into a matrix in R?](https://stackoverflow.com/questions/14614946/how-to-turn-a-vector-into-a-matrix-in-r)
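As a quick reminder of how matrix() behaves (a minimal base-R sketch with made-up numbers): it fills the matrix column by column, so each column of the reshaped matrix will hold a run of consecutive readings, i.e. a short time window.

```r
# matrix() fills column-wise by default
v <- 1:12                      # stand-in for a vector of readings
m <- matrix(v, nrow = 4, ncol = 3)
m[, 1]                         # first column = the first 4 consecutive values
dim(m)                         # 4 rows, 3 columns
```

That is exactly why the reshaped 20 x 150 matrix can be read as 150 consecutive time windows of 20 readings each.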
DF_M1 <- DF_TEMP %>%
filter(EventText == "Tubing Process, resistance Ohm") %>%
filter(Name == "Machine #1") %>%
select(AnalogVal) %>%
t() %>% # this brings us a matrix
as.vector() %>% # let's make it a vector
head(3000) %>% # keep only a fixed number of values
matrix(nrow = 20, ncol = 150) # reshaping that into a 20 x 150 matrix
Wonderful! Let’s try to see our new object as an image!
Let’s use a plotly 3D graph to explore what we have got!
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_ly(z = DF_M1, type = "surface")
Machine 1 used for Training
This should be good enough to fit our deep learning model, but before we do so I will ‘prepare’ my test datasets.
A. For machine 2:
DF_M2 <- DF_TEMP %>%
filter(EventText == "Tubing Process, resistance Ohm") %>%
filter(Name == "Machine #2") %>%
select(AnalogVal) %>%
t() %>% # this brings us a matrix
as.vector() %>% # let's make it a vector
head(1500) %>% # keep only a fixed number of values
matrix(nrow = 10, ncol = 150) # reshaping that into a 10 x 150 matrix
B. For machine 3:
DF_M3 <- DF_TEMP %>%
filter(EventText == "Tubing Process, resistance Ohm") %>%
filter(Name == "Machine #3") %>%
select(AnalogVal) %>%
t() %>% # this brings us a matrix
as.vector() %>% # let's make it a vector
head(3000) %>% # keep only a fixed number of values
matrix(nrow = 20, ncol = 150) # reshaping that into a 20 x 150 matrix
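The extraction chunks above repeat the same pipeline for each machine. A small helper would avoid the duplication (a hypothetical base-R sketch; the column names match our DF_TEMP):

```r
# Build an n_rows x (n_values / n_rows) matrix of readings for one machine.
# Assumes df has columns Name, EventText and AnalogVal, as in DF_TEMP.
make_machine_matrix <- function(df, machine, n_values, n_rows,
                                event = "Tubing Process, resistance Ohm") {
  v <- df$AnalogVal[df$Name == machine & df$EventText == event]
  matrix(head(v, n_values), nrow = n_rows)
}

# e.g. DF_M2 <- make_machine_matrix(DF_TEMP, "Machine #2", 1500, 10)
```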
And we can also visualize these to compare:
For Machine 2
plot_ly(z = DF_M2, type = "surface")
For Machine 3
plot_ly(z = DF_M3, type = "surface")
Launch the H2O machine again…
# to load the library
library(h2o)
# to initialize the 'machine'
localH2O = h2o.init()
##
## H2O is not running yet, starting it now...
##
## Note: In case of errors look at the following log files:
## C:\Users\fxtrams\AppData\Local\Temp\RtmpqK2I0m/h2o_fxtrams_started_from_r.out
## C:\Users\fxtrams\AppData\Local\Temp\RtmpqK2I0m/h2o_fxtrams_started_from_r.err
##
##
## Starting H2O JVM and connecting: . Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 2 seconds 99 milliseconds
## H2O cluster version: 3.14.0.7
## H2O cluster version age: 22 days
## H2O cluster name: H2O_started_from_R_fxtrams_aho232
## H2O cluster total nodes: 1
## H2O cluster total memory: 1.77 GB
## H2O cluster total cores: 4
## H2O cluster allowed cores: 4
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## H2O API Extensions: Algos, AutoML, Core V3, Core V4
## R Version: R version 3.2.5 (2016-04-14)
Then we will upload the datasets into H2O. Remember, H2O is operated from R, but it runs as a separate process alongside it!
# Import train data into the H2O cluster
train_M1 <- as.h2o(x = DF_M1, destination_frame = "train_M1")
##
|
| | 0%
|
|=================================================================| 100%
# Also import our test datasets for Machines 2 and 3...
test_M2 <- as.h2o(x = DF_M2, destination_frame = "test_M2")
##
|
| | 0%
|
|=================================================================| 100%
test_M3 <- as.h2o(x = DF_M3, destination_frame = "test_M3")
##
|
| | 0%
|
|=================================================================| 100%
Now that we know what our data looks like, we can start building our anomaly detection model.
# Train deep autoencoder learning model on "normal"
# training data, y ignored
normality_model <- h2o.deeplearning(
x = names(train_M1),
training_frame = train_M1,
activation = "Tanh",
autoencoder = TRUE,
hidden = c(50,20,50),
sparse = TRUE,
l1 = 1e-4,
epochs = 100)
##
|
| | 0%
|
|=================================================================| 100%
Let’s use this model on our training dataset…
# compute the reconstruction error of the model
h2o.anomaly(normality_model, train_M1) %>% as.data.frame() %>% plot.ts(ylim = c(0, 2))
# visualize the reconstruction
test_recon_M1 <- h2o.predict(normality_model, train_M1) %>% as.matrix()
##
|
| | 0%
|
|=================================================================| 100%
plot_ly(z = test_recon_M1, type = "surface")
Now we will try the test dataset from Machine 2.
# Compute reconstruction error with the Anomaly
# detection app (MSE between output and input layers)
h2o.anomaly(normality_model, test_M2) %>% as.data.frame() %>% plot.ts(ylim = c(0, 10), type = "p")
And for machine 3
h2o.anomaly(normality_model, test_M3) %>% as.data.frame() %>% plot.ts(ylim = c(0, 10), type = "p")
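Conceptually, h2o.anomaly() returns the per-row reconstruction MSE, i.e. the mean squared difference between each input row and its reconstruction by the autoencoder. A base-R sketch of that computation (not the H2O internals):

```r
# per-row mean squared error between original and reconstructed matrices
row_mse <- function(original, reconstructed) {
  rowMeans((original - reconstructed)^2)
}

# a perfect reconstruction gives zero error for every row
x <- matrix(c(1, 2, 3, 4), nrow = 2)
row_mse(x, x)            # 0 0
```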
It tells us that the behaviour of Machine 2 is quite different from Machine 1. And on Machine 3 we have clear peaks, distinguishable enough to recover the anomaly!
Now we can obtain predictions, i.e. reconstructed physical values, from our model. We provide the model and the test dataset.
# Note: Testing = Reconstructing the test dataset
test_recon_M3 <- h2o.predict(normality_model, test_M3) %>% as.matrix()
##
|
| | 0%
|
|=================================================================| 100%
plot_ly(z = test_recon_M3, type = "surface")
We cannot see those peaks as clearly now; however, let us be satisfied with the model result and think about how to implement our Deep Learning model in a Shiny app.
To use our model in our Shiny app we will save it…
if(!file.exists("www/tmp/normality_model.bin")){
h2o.saveModel(normality_model, "www/tmp/normality_model.bin")
h2o.download_pojo(normality_model, "www/tmp", get_jar = TRUE)
}
And let’s not forget to switch off our cluster!
h2o.shutdown(prompt = FALSE)
## [1] TRUE
In this example the anomaly detection model was able to flag the anomaly in rows 21-23.
It learned the pattern from many vectors and was able to distinguish an anomaly in a new dataset.
A practical use of this model is the function h2o.anomaly(): when the MSE value is high, an anomaly is detected!
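One simple way to turn those MSE values into an alarm is a mean-plus-k-sigma rule (a hypothetical sketch, not from the original article; the threshold choice would need tuning on real data):

```r
# flag rows whose reconstruction MSE is far above the typical level
detect_anomalies <- function(mse, k = 3) {
  threshold <- mean(mse) + k * sd(mse)   # hypothetical thresholding rule
  which(mse > threshold)
}

# a single large error among small ones gets flagged
detect_anomalies(c(0.1, 0.12, 0.09, 0.11, 5.0), k = 1)   # → 5
```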
Our next step will be to repeat the procedure on our own machine data.
Example from: [https://dzone.com/articles/anomaly-detection-with-deep-learning-in-r-with-h2o](https://dzone.com/articles/anomaly-detection-with-deep-learning-in-r-with-h2o)
More reading: [https://dzone.com/articles/the-basics-of-deep-learning-how-to-apply-it-to-pre?fromrel=true](https://dzone.com/articles/the-basics-of-deep-learning-how-to-apply-it-to-pre?fromrel=true)
And: [https://shiring.github.io/machine_learning/2017/05/01/fraud](https://shiring.github.io/machine_learning/2017/05/01/fraud)
Paper: [https://arxiv.org/abs/1701.01887](https://arxiv.org/abs/1701.01887) In this lecture we explored the ‘technology’ on a sample, trying to fit it into a 10-minute lecture!